The Union government seems to have given in to the hype surrounding ChatGPT. A report by The Indian Express indicated that the Ministry of Electronics and Information Technology was trying to integrate ChatGPT with a WhatsApp chatbot to create a search engine for government schemes targeted at farmers. For the uninitiated, ChatGPT is an artificial intelligence (AI) chatbot created by the US-based research body OpenAI. Microsoft has the exclusive licence for the underlying technology behind GPT-3, the AI language model that the chatbot is built on.
The chatbot has featured prominently in the news since its launch in November 2022. AI models like ChatGPT are compelling because they can perform a wide range of tasks, such as writing code, essays, or poems. While there are several AI prototypes capable of doing similar tasks, ChatGPT does so with surprising proficiency. Aside from what ChatGPT means for innovation, it provides important lessons for India’s path towards fostering domestic AI capabilities.
Datasets Are Abundantly Available
There are two critical components to developing AI: data and computing power. Thus far, India has placed considerable emphasis on the significance of data to enable the growth of AI. Illustratively, the 2019 Draft E-Commerce Policy stated that data is the most critical factor in the success of an enterprise, noting that greater volumes of data directly correlate with better results from artificial intelligence in analysing this data. Further, a stated objective of the 2022 Draft National Data Governance Policy is the creation of large India-centric datasets to catalyse the AI and analytics ecosystem. These statements indicate that access to data is a key policy priority for India, and imply that data is somewhat scarce or hard to come by.
However, evidence suggests that India’s assumptions, namely that vast amounts of data are necessary to catalyse AI and that such data is scarce or unavailable, may be misplaced. According to researcher Gwern Branwen, ChatGPT is able to improve its performance despite being trained on a low-quality internet dataset that is small enough to fit on a conventional laptop.
There are also many initiatives making high-quality datasets publicly available to train AI. Illustratively, the not-for-profit Common Crawl created a corpus of millions of gigabytes of data that anyone can access and use. Another, the Pile, offers publicly available specialised datasets to train AI. It contains data from peer-reviewed academic journals and websites, Wikipedia, and Books3, a dataset of 196,640 fiction and non-fiction books.
Research by EleutherAI, the entity that created the Pile, found that high-quality, diverse specialised datasets of this kind contribute to significant improvements in the performance of AI like ChatGPT.
Finally, experts indicate that AI models may soon generate their own training data, rendering the argument for access to it null and void. Researchers have already built AI models that can generate their own data and improve their capabilities by up to 33 per cent!
India’s Supercomputers
Now let us understand the link between AI and computation power. AI requires a considerable amount of computation power to develop. As Aidan Gomez, co-founder of the AI start-up Cohere, puts it more precisely, building AI like ChatGPT requires supercomputers. A supercomputer is a very high-performing computing machine. According to a report by the Center for Security and Emerging Technology, it would take a laptop thousands of years to create AI like ChatGPT.
ChatGPT was trained on a supercomputer built exclusively for OpenAI by Microsoft. While it hasn’t been publicly benchmarked, the tech giant claims it would rank fifth when stacked against the other supercomputers worldwide.
India is working to bolster its domestic supercomputing capabilities. It launched the National Supercomputing Mission in 2015 for this purpose, and fifteen supercomputers have been installed since then.
Despite this push, India has no supercomputers in the top 100 supercomputers in the world, and only two in the Top500, as of November 2022. These facts further drive home the point that computing, not data, is the most pressing policy problem for India in AI.
To develop homegrown AI, India must go back to the drawing board to consider how it can improve or bolster current schemes for supercomputing.
For instance, it may consider incentivising greater private sector involvement in supercomputing development, something experts note is a key differentiator between India and other countries. Globally, industry drives supercomputing usage, whereas in India it is driven by research institutions. Experts argue that these institutions may not have the wherewithal to commercialise supercomputing, limiting the scope for investment and further technological advancement.
Given where India currently stands in computing capability, it is unlikely that it will be at the frontier of AI development any time soon. However, if it redirects policy focus away from data and towards the development of computing capability, it will be in a better position to enjoy success in the field of AI.
The author is a Fellow at the Esya Centre and a consultant for Koan Advisory. Views are personal.
(Edited by Theres Sudeep)